Distributed Private Heavy Hitters 



Justin Hsu*  Sanjeev Khanna†  Aaron Roth‡

February 23, 2012 


Abstract

In this paper, we give efficient algorithms and lower bounds for solving the heavy hitters
problem while preserving differential privacy in the fully distributed local model. In this model,
there are $n$ parties, each of which possesses a single element from a universe of size $N$. The
heavy hitters problem is to find the identity of the most common element shared amongst the
$n$ parties. In the local model, there is no trusted database administrator, and so the algorithm
must interact with each of the $n$ parties separately, using a differentially private protocol. We
give tight information-theoretic upper and lower bounds on the accuracy to which this problem
can be solved in the local model (giving a separation between the local model and the more
common centralized model of privacy), as well as computationally efficient algorithms even in
the case where the data universe $N$ may be exponentially large.

1 Introduction

Consider the problem of a website administrator who wishes to know what his most common traffic
sources are. Each of $n$ visitors arrives with a single referring site: the name of the last website
that she visited, which is drawn from a vast universe $\mathcal{N}$ of possible referring sites
($\mathcal{N}$ here is the set of all websites on the internet). There is value in identifying the
most popular referring site (the heavy hitter): the site administrator may be able to better tailor
the content of his webpage, or better focus his marketing resources. On the other hand, the identity
of each individual's referring site might be embarrassing or otherwise revealing, and is therefore
private information. We can therefore imagine a world in which this information must be treated
"privately." Moreover, in this situation, visitors are communicating directly with the servers of
the websites that they visit: i.e. there is no third party who might be trusted to aggregate all of
the referring website data and provide privacy preserving statistics to the website administrator.
In this setting, how well can the website administrator estimate the heavy hitter while being able
to provide formal privacy guarantees to his visitors?

This situation can more generally be modeled as the heavy hitters problem under the constraint
of differential privacy. There are $n$ individuals $i \in [n]$, each of whom is associated with an
element $v_i \in \mathcal{N}$ of some large data universe $\mathcal{N}$. The heavy hitter is the most
frequently occurring element $x \in \mathcal{N}$ among the set $\{v_1, \ldots, v_n\}$, and we would
like to be able to identify that element, or one that occurs almost as frequently as the heavy
hitter. Moreover, we wish to solve this problem while preserving differential privacy in the fully
distributed (local) model. We define this formally in Section 2, but roughly speaking, an algorithm
is differentially private if changes to the data of single individuals only result in small changes
in the output distribution of the algorithm. Moreover, in the fully distributed setting, each
individual (who can be viewed as a database of size 1) must interact with the algorithm
independently of all of the other individuals, using a differentially private algorithm. This is in
contrast to the more commonly studied centralized model, in which a trusted database administrator
may have (exact) access to all of the data, and coordinate a private computation.

We study this problem both from an information theoretic point of view, and from the point
of view of efficient algorithms. We say that an algorithm for the private heavy hitters problem is
efficient if it runs in time $\mathrm{poly}(n, \log N)$: i.e. polynomial in the database size, but
only polylogarithmic in the universe size (i.e. in what we view as the most interesting range of
parameters, the universe may be exponentially larger than the size of the database). We give tight
information theoretic upper and lower bounds on the accuracy to which the heavy hitter can be found
in the private distributed setting (separating this model from the private centralized setting), and
give several efficient algorithms which achieve good, although information-theoretically sub-optimal,
accuracy guarantees. We leave open the question of whether efficient algorithms can exactly match
the information theoretic bounds we prove for the private heavy hitters problem in the distributed
setting.

1.1 Our Results 

In this section, we summarize our results. The bounds we discuss here are informal and hide many 
of the parameters which we have not yet defined. The formal bounds are given in the main body 
of the paper. 

First, we provide an information theoretic characterization of the accuracy to which any
algorithm (independent of computational constraints) can solve the heavy hitters problem in the
private distributed setting. We say that an algorithm is $\alpha$-accurate if it returns a universe
element which occurs with frequency at most an additive $\alpha$ smaller than the true heavy hitter.
In the centralized setting, a simple application of the exponential mechanism [MT07] gives an
$\alpha$-accurate mechanism for the heavy hitters problem where $\alpha = O(\log |\mathcal{N}|)$,
which in particular is independent of the number of individuals $n$. In contrast, we show that in
the fully distributed setting, no algorithm can be $\alpha$-accurate for some
$\alpha = \Omega(\sqrt{n})$, even in the case in which $|\mathcal{N}| = 2$. Conversely, we give an
almost matching upper bound (and an algorithm with run-time linear in $N$) which is
$\alpha$-accurate for $\alpha = O(\sqrt{n \log N})$.

Next, we consider efficient algorithms which run in time only polylogarithmic in the universe 
size $|\mathcal{N}|$. Here, we give two algorithms. One is an application of a compressed sensing algorithm
of Gilbert et al [GLPSIO] . which is a-accurate for a = (5(n^/^log A^loglog A^). Then, we give 
an algorithm based on group-testing using pairwise independent hash functions, which has an
incomparable bound. Roughly speaking, it guarantees to return the exact heavy hitter (i.e.
$\alpha = 0$) whenever the frequency of the heavy hitter is larger than the $\ell_2$-norm of the
frequencies of the remaining elements. Depending on how these frequencies are distributed, this can
correspond to a bound of $\alpha$-accuracy for $\alpha$ ranging anywhere between the optimal
$\alpha = O(\sqrt{n})$ to $\alpha = O(n)$.
1.2 Our Techniques

Our upper bounds, both information theoretic and those with efficient algorithms, are based on
the general technique of random projection and concentration of measure. To prove our information
theoretic upper bound, we observe that to find the heavy hitter, we may view the private database
as a histogram $v$ in $N$-dimensional space. Then, it is enough to find the index $i \in [N]$ of the
universe element which maximizes $\langle v, e_i \rangle$, where $e_i$ is the $i$'th standard basis
vector. Both $v$ and each $e_i$ have small $\ell_1$-norm, and so each of these inner products can be
approximately preserved by taking a random projection into $O(\log N)$ dimensional space. Moreover,
we can project each individual's data into this space independently in the fully distributed
setting, incurring a loss of only $O(\sqrt{n})$ in accuracy. This mechanism, however, is not
efficient, because to find the heavy hitter, we must enumerate through all $|\mathcal{N}|$ basis
vectors $e_i$ in order to find the one that maximizes the inner product with the projected database.
Similar ideas lead to our efficient algorithms, albeit with worse accuracy guarantees. For example,
in our first algorithm, we apply techniques from compressed sensing to the projected database to
recover (approximately) the heavy hitter, rather than checking basis vectors directly. In our second
algorithm, we take a projection using a particular family of pairwise-independent hash functions,
which are linear functions of the data universe elements. Because of this linearity, we are able to
efficiently "invert" the projection matrix in order to find the heavy hitter.

Our lower bound separates the distributed setting from the centralized setting by applying an
anti-concentration argument. Roughly speaking, we observe that in the fully distributed setting, if
individual data elements were selected uniformly i.i.d. from the data universe $\mathcal{N}$, then
even after conditioning on the messages exchanged with any differentially private algorithm, they
remain independently distributed, and approximately uniform. Therefore, by the Berry-Esseen theorem,
even after any algorithm computes its estimate of the heavy hitter, the true distribution over counts
remains approximately normally distributed. Since the Gaussian distribution exhibits strong
anti-concentration properties, this allows us to unconditionally give an $\Omega(\sqrt{n})$ lower
bound for any algorithm in the fully distributed setting.



1.3 Related Work 

Differential privacy was introduced in a sequence of papers culminating in [DMNS06], and has since
become the standard "solution concept" for privacy in the theoretical computer science literature.
There is by now a very large literature on this topic, which is too large to summarize here. Instead,
we focus only on the most closely related work, and refer the curious reader to a survey of Dwork
[Dwo08].

Most of the literature on differential privacy focuses on the centralized model, in which there is
a trusted database administrator. In this paper, we focus on the local or fully distributed model,
introduced by [KLN+08], in which each individual holds their own data (i.e. there are $n$ databases,
each of size 1), and the algorithm must interact with each one in a differentially private manner.
There has been little work in this more restrictive model: the problems of learning [KLN+08] and
query release [GHRU11] in the local model are well understood^1, but only up to polynomial factors
that do not imply tight bounds for the heavy hitters problem. The two-party setting (which is
intermediate between the centralized and fully distributed setting), in which the data is divided
between two databases without a trusted central administrator, was considered by [MMP+10]. They
proved a separation between the two-party setting and the centralized setting for the problem of
computing the Hamming distance between two strings. In this work, we prove a separation between
the fully distributed setting and the centralized setting for the problem of estimating the heavy
hitter.

^1 Roughly, the set of concepts that can be learned in the local model given polynomial sample
complexity is equal to the set of concepts that can be learned in the SQ model given polynomial
query complexity [KLN+08], and the set of queries that can be released in the local model given
polynomial sample complexity is equal to the set of concepts that can be agnostically learned in
the SQ model given polynomial query complexity [GHRU11], but the polynomials are not equal.

A variant of the private heavy hitters problem has been considered in the setting of pan-private 
streaming algorithms [DNP+10, MMNW11]. This work considers a different (although related)
problem in a different (although related) setting. [DNP+10, MMNW11] consider a setting in
which a stream of elements is presented to the algorithm, and the algorithm must estimate the 
approximate count of frequently occurring elements (i.e. the number of "heavy hitters"). In this 
setting, the universe elements themselves are the individuals appearing in the stream, and so it is 
not possible to reveal the identity of the heavy hitter. In contrast, in our work, individuals are 
distinct from universe elements, which merely label the individuals. Moreover, our goal here is to 
actually identify a specific universe element which is the heavy hitter, or which occurs almost as 
frequently. Also, [DNP+10, MMNW11] work in the centralized setting, but demand pan-privacy,
which roughly requires that the internal state of the algorithm itself remain differentially private. 
In contrast, we work in the local privacy setting which gives a guarantee which is strictly stronger 
than pan-privacy. Because algorithms in the local privacy setting only interact with individuals in 
a differentially private way, and never have any other access to the private data, any algorithm in 
the local privacy model can never have its state depend on data in a non-private way, and such 
algorithms therefore also preserve pan-privacy. Therefore, our upper bounds hold also in the setting 
of pan-privacy, whereas our lower bounds do not necessarily apply to algorithms which only satisfy 
the weaker guarantee of pan-privacy. 

Finally, we note that many of the upper bound techniques we employ have been previously 
put to use in the centralized model of data privacy, i.e. random projections [BLR08, BR11] and
compressed sensing (both for lower bounds [DMT07] and algorithms [LZWY11]). As algorithmic
techniques, these are rarely optimal in the centralized privacy setting. We remark that they are 
particularly well suited to the fully distributed setting which we study here, because in a formal 
sense, algorithms in the local model of privacy are constrained to only access the private data using 
noisy linear queries, which is exactly the form of access used by random linear projections and 
compressed sensing measurements. 



2 Preliminaries 

A database $v$ consists of $n$ records from a data universe $\mathcal{N}$, one corresponding to each
of $n$ individuals: for $i \in [n]$, $v_i \in \mathcal{N}$ and $v = \{v_1, \ldots, v_n\}$, which may
be a multiset. Without loss of generality, we will index the elements of the data universe from 1 to
$|\mathcal{N}|$. It will be convenient for us to represent databases as histograms. In this
representation, $v \in \mathbb{N}^{|\mathcal{N}|}$, where $v_i$ represents the number of occurrences
of the $i$'th universe element in the database. Further, we write $v^i \in \mathbb{N}^{|\mathcal{N}|}$
for each individual $i \in [n]$, where $v^i_j = 1$ if individual $i$ is associated with the $j$'th
universe element, and $v^i_{j'} = 0$ for all other $j' \neq j$. Note that in this histogram
notation, we have $v = \sum_{i=1}^n v^i$. In the following, we will usually use the histogram
notation for mathematical convenience, with the understanding that we can in fact more concisely
represent the database as a multiset.

Given a database $v$, the heavy hitter is the universe element that occurs most frequently in the
database: $hh(v) = \arg\max_{j \in \mathcal{N}} v_j$. We refer to the frequency with which the heavy
hitter occurs as $f_{hh}(v) = v_{hh(v)}$. We want to design algorithms which return universe
elements that occur almost as frequently as the heavy hitter.

Definition 2.1. An algorithm $A$ is $(\alpha, \beta)$-accurate for the heavy hitters problem if for
every database $v \in \mathbb{N}^{|\mathcal{N}|}$, with probability at least $1 - \beta$:
$A(v) = i^*$ such that $v_{i^*} \geq f_{hh}(v) - \alpha$.
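As a small illustration of the histogram notation and of Definition 2.1, the following sketch builds a histogram from a multiset of records and checks whether a proposed answer is within an additive $\alpha$ of the heavy hitter's frequency. The function names are hypothetical, the universe is indexed from 0 rather than 1 for convenience, and ties are broken toward the smallest index (an arbitrary choice that the definition does not prescribe).

```python
from collections import Counter


def to_histogram(records, universe_size):
    """Histogram v: entry j counts how many records equal universe element j."""
    counts = Counter(records)
    return [counts.get(j, 0) for j in range(universe_size)]


def heavy_hitter(histogram):
    """Return (hh(v), f_hh(v)); ties are broken toward the smallest index."""
    best = max(range(len(histogram)), key=lambda j: (histogram[j], -j))
    return best, histogram[best]


def is_alpha_accurate(histogram, answer, alpha):
    """Check the accuracy condition v[answer] >= f_hh(v) - alpha."""
    _, f_hh = heavy_hitter(histogram)
    return histogram[answer] >= f_hh - alpha
```

For example, with records `[2, 2, 0, 1, 2, 1]` over a universe of size 4, element 2 is the heavy hitter with frequency 3, and element 1 (frequency 2) is an acceptable answer for $\alpha = 1$.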

2.1 Differential Privacy 

Differential privacy constrains the sensitivity of a randomized algorithm to individual changes in 
its input. 

Definition 2.2. An algorithm $A : \mathbb{N}^{|\mathcal{N}|} \to \mathcal{R}$ is
$(\epsilon, \delta)$-differentially private if for all $v, v' \in \mathbb{N}^{|\mathcal{N}|}$ such
that $\|v - v'\|_1 \leq 1$, and for all events $S \subseteq \mathcal{R}$:

$$\Pr[A(v) \in S] \leq \exp(\epsilon) \Pr[A(v') \in S] + \delta$$

Typically, we will want $\delta$ to be negligibly small, whereas we think of $\epsilon$ as being a
small constant (and never smaller than $\epsilon = O(1/n)$).

A useful distribution is the Laplace distribution. 

Definition 2.3 (The Laplace Distribution). The Laplace Distribution (centered at 0) with scale $b$
is the distribution with probability density function
$\mathrm{Lap}(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)$. We will sometimes
write $\mathrm{Lap}(b)$ to denote the Laplace distribution with scale $b$, and will sometimes abuse
notation and write $\mathrm{Lap}(b)$ simply to denote a random variable $X \sim \mathrm{Lap}(b)$.

A fundamental result in data privacy is that perturbing low sensitivity queries with Laplace
noise preserves $(\epsilon, 0)$-differential privacy.

Theorem 2.4 ([DMNS06]). Suppose $Q : \mathbb{N}^{|\mathcal{N}|} \to \mathbb{R}$ is a function such
that for all databases $v, v' \in \mathbb{N}^{|\mathcal{N}|}$ such that $\|v - v'\|_1 \leq 1$,
$|Q(v) - Q(v')| \leq c$. Then the procedure which on input $v$ releases $Q(v) + X$, where $X$ is a
draw from a $\mathrm{Lap}(c/\epsilon)$ distribution, preserves $(\epsilon, 0)$-differential
privacy.
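The mechanism in Theorem 2.4 can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names are hypothetical, the query's sensitivity `c` must be supplied correctly by the caller, and the Laplace sample is generated as the difference of two independent exponential draws, a standard construction that is not taken from the paper.

```python
import random


def sample_laplace(scale, rng):
    """Draw from Lap(scale): the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(query, database, sensitivity, epsilon, rng):
    """Release query(database) + Lap(sensitivity / epsilon), as in Theorem 2.4."""
    return query(database) + sample_laplace(sensitivity / epsilon, rng)
```

For a counting query such as `sum` over a 0/1 database, the sensitivity is 1, so `laplace_mechanism(sum, db, 1.0, epsilon, random.Random())` adds noise of scale $1/\epsilon$.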

It will be useful to understand how privacy parameters for individual steps of an algorithm 
compose into privacy guarantees for the entire algorithm. The following useful theorem is a special 
case of a theorem proven by Dwork, Rothblum, and Vadhan: 

Theorem 2.5 (Privacy Composition [DRV10]). Let $0 < \epsilon, \delta < 1$, and let
$M_1, \ldots, M_T$ be $(\epsilon', 0)$-differentially private algorithms for some
$\epsilon' \leq \epsilon / \sqrt{8T \log(1/\delta)}$. Then the algorithm $M$ which on
input $v$ outputs $M(v) = (M_1(v), \ldots, M_T(v))$ is $(\epsilon, \delta)$-differentially private.
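To make the budget split in Theorem 2.5 concrete, the sketch below computes the largest per-step parameter $\epsilon'$ the theorem allows for $T$ steps. The function name is hypothetical, and the formula is simply the bound stated above; no tighter composition result is used.

```python
import math


def per_step_epsilon(epsilon, delta, num_steps):
    """Largest epsilon' permitted by Theorem 2.5 for `num_steps` composed algorithms."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("Theorem 2.5 assumes 0 < epsilon, delta < 1")
    if num_steps < 1:
        raise ValueError("need at least one step")
    return epsilon / math.sqrt(8.0 * num_steps * math.log(1.0 / delta))
```

Note that the per-step budget shrinks like $1/\sqrt{T}$ rather than $1/T$, which is what makes answering many low-sensitivity queries per individual affordable.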

The local privacy model (alternately, the fully distributed setting) was introduced by
Kasiviswanathan et al. [KLN+08] in the context of learning. The local privacy model formalizes
randomized response: there is no central database of private data. Instead, each individual $i$
maintains possession of their own data element (i.e. a database $v^i$ of size $\|v^i\|_1 = 1$), and
answers questions about it only in a differentially private manner. Formally, the database
$v \in \mathbb{N}^{|\mathcal{N}|}$ is the sum of $n$ databases of size 1: $v = \sum_{i=1}^n v^i$,
where each $v^i$ is held by individual $i$.

Definition 2.6 ([KLN+08] (Local Randomizer)). An $(\epsilon, \delta)$-local randomizer
$R : \mathbb{N}^{|\mathcal{N}|} \to \mathcal{R}$ is an $(\epsilon, \delta)$-differentially private
algorithm that takes a database of size 1.

In the local privacy model, algorithms may interact with the database only through a local 
randomizer oracle: 

Definition 2.7 ([KLN+08] (LR Oracle)). An LR oracle $LR_v(\cdot, \cdot)$ takes as input an index
$i \in [n]$ and an $(\epsilon, \delta)$-local randomizer $R$ and outputs a random value
$w \in \mathcal{R}$ chosen according to the distribution $R(v^i)$, where $v^i$ is the element held
by the $i$'th individual in the database.

Definition 2.8 ([KLN+08] (Local Algorithm)). An algorithm is $(\epsilon, \delta)$-local if it
accesses the database $v$ via the oracle $LR_v$, and satisfies the following restriction: if
$LR_v(i, R_1), \ldots, LR_v(i, R_k)$ are the algorithm's invocations of $LR_v$ on index $i$, then
the joint outputs of each of these $k$ algorithms must be $(\epsilon, \delta)$-differentially
private.

To avoid cumbersome notation, we will avoid the formalism of LR oracles, instead remembering
that for algorithms in the local model, any operation on $v^i$ must be carried out without access to
any $v^j$ for $j \neq i$, and must be differentially private in isolation.
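Since the local model formalizes randomized response, a minimal sketch of binary randomized response may help fix ideas. It is an $(\epsilon, 0)$-local randomizer for a two-element universe, and is shown only as an illustration: it is the classical technique, not one of the algorithms of this paper, and all names below are hypothetical.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit; p / (1 - p) = exp(epsilon)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Local randomizer: run by the individual on her own bit only."""
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit


def estimate_ones(reports, epsilon):
    """Unbiased estimate of how many individuals hold bit 1, from noisy reports."""
    p = truth_probability(epsilon)
    n = len(reports)
    # E[sum(reports)] = n * (1 - p) + k * (2p - 1), where k is the true count.
    return (sum(reports) - n * (1.0 - p)) / (2.0 * p - 1.0)
```

Each individual's report depends only on her own bit, and the aggregator only ever sees the reports, which is exactly the access pattern the local model allows.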

2.2 Probabilistic Tools 

We will make use of several useful probabilistic tools. First, the well-known Johnson-Lindenstrauss 
lemma: 

Theorem 2.9 (Johnson-Lindenstrauss Lemma). Let $0 < \gamma < 1$ be given. For any set $V$ of $q$
vectors in $\mathbb{R}^N$, there exists a linear map $A : \mathbb{R}^N \to \mathbb{R}^m$ with
$m = O(\log q / \gamma^2)$ such that $A$ is approximately an isometric embedding of $V$ into
$\mathbb{R}^m$. That is, for all $x, y \in V$, we have the two bounds

$$(1 - \gamma)\|x - y\|^2 \leq \|A(x - y)\|^2 \leq (1 + \gamma)\|x - y\|^2$$
$$|\langle Ax, Ay \rangle - \langle x, y \rangle| \leq O(\gamma(\|x\|^2 + \|y\|^2))$$

In particular, any $m \times N$ random projection matrix $A$, whose entries are drawn IID uniformly
from $\{-1/\sqrt{m}, 1/\sqrt{m}\}$, enjoys this property with probability at least $1 - \beta$, with
$m = O(\log q \log(1/\beta) / \gamma^2)$. Note that this projection matrix does not depend on the
set of vectors $V$.

In other words, any set of $q$ points in a high dimensional space can be obliviously embedded
into a space of dimension $O(\log q)$ such that w.h.p. this embedding essentially preserves pairwise
distances.
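The random sign matrix described in Theorem 2.9 is easy to generate and apply. The sketch below uses only the Python standard library; the function names are hypothetical, and the dimension `m` is passed in directly rather than derived from $\gamma$ and $\beta$, since the constants hidden in the $O(\cdot)$ are not specified here.

```python
import math
import random


def random_sign_matrix(m, N, rng):
    """m x N matrix with IID entries drawn uniformly from {-1/sqrt(m), +1/sqrt(m)}."""
    s = 1.0 / math.sqrt(m)
    return [[rng.choice((-s, s)) for _ in range(N)] for _ in range(m)]


def project(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))
```

One property worth noting: every column of such a matrix has squared norm exactly 1, so standard basis vectors keep their length exactly under the projection, while general vectors keep it only approximately.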

In our analysis, we will also make use of a simple tail bound on the sums of Laplace random 
variables: 

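Theorem 2.10 (See, e.g. [GRU12]). Let $X_i$, $i \in [n]$, be IID random variables drawn from the
$\mathrm{Lap}(b)$ distribution (the Laplace distribution with parameter $b$), and let
$X = \sum_{i=1}^n X_i$. Then, we have the bound

$$\Pr[X \geq T] \leq \begin{cases} \exp\left(-\frac{T^2}{6nb^2}\right) & : T \leq nb \\ \exp\left(-\frac{T}{6b}\right) & : T > nb \end{cases}$$

In particular, choosing $T_\beta = b\sqrt{6n\log(2/\beta)}$ gives

$$\Pr[|X| \leq T_\beta] \geq 1 - \beta$$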
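The choice of $T_\beta$ can be checked by a short calculation, assuming we are in the first regime of the bound ($T_\beta \leq nb$, which holds once $n \geq 6\log(2/\beta)$) and using the symmetry of the Laplace distribution for the two-sided statement:

```latex
\begin{align*}
\exp\!\left(-\frac{T_\beta^2}{6nb^2}\right)
  &= \exp\!\left(-\frac{6nb^2\log(2/\beta)}{6nb^2}\right)
   = \frac{\beta}{2}, \\
\Pr[|X| > T_\beta]
  &\leq \Pr[X \geq T_\beta] + \Pr[-X \geq T_\beta]
   \leq \frac{\beta}{2} + \frac{\beta}{2} = \beta .
\end{align*}
```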
3 Information Theoretic Upper and Lower Bounds



In this section we present upper and lower bounds on the accuracy to which any algorithm in the 
fully distributed model can privately approximate heavy hitters. Our upper bound can be viewed 
as an algorithm, albeit one that runs in time linear in $|\mathcal{N}|$ and so is not what we consider to be
efficient. 



3.1 An Upper Bound via Johnson-Lindenstrauss Projections 

We present here our first algorithm, referred to as JL-HH, that solves the heavy hitters problem
in the local model using the Johnson-Lindenstrauss lemma. The algorithm JL-HH is outlined in
Algorithm 1. We write $e_i$ to refer to the $i$'th standard basis vector in $\mathbb{R}^N$, and
write RandomProjection$(m, N + 1)$ for a subroutine which returns a linear embedding of $N + 1$
points into $m$ dimensions using a random $\pm 1/\sqrt{m}$ valued projection matrix, as specified by
the Johnson-Lindenstrauss lemma. By the Johnson-Lindenstrauss lemma, for any set of $N + 1$
elements, this map approximately preserves pairwise distances with high probability.

Algorithm 1 JL-HH Mechanism 

Input: Private histograms $v^i \in \mathbb{N}^N$, $i \in [n]$. Privacy parameters
$\epsilon, \delta > 0$. Failure probability $\beta > 0$.

Output: $p^*$, index of the heavy hitter.
7 ^ 

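$m \leftarrow \log(N+1)\log(2/\beta)/\gamma^2$
$A \leftarrow$ RandomProjection$(m, N + 1)$
for $p = 1$ to $N$ indices do
    for $i = 1$ to $n$ users do
        $z^i \sim \mathrm{Lap}\left(\sqrt{8\log(1/\delta)}/\epsilon\right)^m$
        $q^i = Av^i + z^i$
        $n^i_p \leftarrow \langle Ae_p, q^i \rangle$
    end for
    $c_p \leftarrow \sum_{i=1}^n n^i_p$
end for
$p^* \leftarrow \arg\max_p c_p$
return $p^*$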



JL-HH is based on the following straightforward idea. If $v$ is a private histogram, we will
estimate the count of the $i$'th element ($\langle v, e_i \rangle$) by estimating
$\langle Av, Ae_i \rangle$, and returning the largest count. By Theorem 2.9, since we are using the
random projections matrix, we have that with high probability, inner products between points in the
set $V = \{e_1, \ldots, e_N, v\}$ are approximately preserved under $A$. However, we cannot access
$Av$ directly since $v$ is private data. To preserve differential privacy, our mechanism must add
noise $z$ to $Av$, and work only with the noisy samples. Our analysis will thus focus on bounding
the error introduced by this noise term. First, though, we show that JL-HH is differentially
private.
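The idea just described can be sketched as follows. This is an illustrative simplification under stated assumptions, not a transcription of Algorithm 1: the projection dimension `m` is passed in as a parameter instead of being computed from $\gamma$ and $\beta$, each user draws her noise vector once and the server sums the noisy projections before taking inner products (by linearity this yields the same kind of estimate $\langle Ae_p, Av + z \rangle$), universe elements are indexed from 0, and all function names are hypothetical.

```python
import math
import random


def laplace(scale, rng):
    """Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def jl_hh(elements, N, m, epsilon, delta, rng):
    """Estimate the heavy hitter; elements[i] in range(N) is user i's private item."""
    s = 1.0 / math.sqrt(m)
    # Public random projection, stored by columns: A_cols[p] is the vector A e_p.
    A_cols = [[rng.choice((-s, s)) for _ in range(m)] for _ in range(N)]
    # Per-coordinate noise scale used for each user's m noisy linear measurements.
    scale = math.sqrt(8.0 * math.log(1.0 / delta)) / epsilon
    q = [0.0] * m  # running sum of the users' noisy projections A v^i + z^i
    for item in elements:
        column = A_cols[item]  # A v^i for a one-hot histogram v^i
        for j in range(m):
            q[j] += column[j] + laplace(scale, rng)  # noise is added on the user's side
    # Server side: score every universe element, which takes time linear in N.
    scores = [sum(a * b for a, b in zip(A_cols[p], q)) for p in range(N)]
    return max(range(N), key=scores.__getitem__)
```

The final scoring loop is what makes this approach run in time linear in the universe size, which is why it is treated as an information theoretic upper bound rather than an efficient algorithm.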

Lemma 3.1. JL-HH operates in the local privacy model and is $(\epsilon, \delta)$-differentially
private.

Proof. The measurement $Av$ is computed in the fully distributed setting, by computing
$Av + z = \sum_{i=1}^n (Av^i + z^i)$. Each individual $i$ may compute $Av^i + z^i$, which
corresponds to answering a sequence of $m$ linear queries, each with sensitivity $1/\sqrt{m}$. By
Theorem 2.4, the noise that JL-HH adds guarantees that each such query is
$\epsilon_0$-differentially private, with

$$\epsilon_0 = \frac{\epsilon}{\sqrt{8m\log(1/\delta)}}$$

Thus, by Theorem 2.5, this composition is $(\epsilon, \delta)$-differentially private, as desired.
From here, the algorithm works with the noised measurement instead of private data, and is therefore
differentially private. □
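As a consistency check between this proof and the noise used by the mechanism, the Laplace scale that Theorem 2.4 requires for a query of sensitivity $c = 1/\sqrt{m}$ at privacy level $\epsilon_0$ works out as follows (a worked calculation from the quantities above; note that it does not depend on $m$):

```latex
\[
b \;=\; \frac{c}{\epsilon_0}
  \;=\; \frac{1}{\sqrt{m}} \cdot \frac{\sqrt{8m\log(1/\delta)}}{\epsilon}
  \;=\; \frac{\sqrt{8\log(1/\delta)}}{\epsilon} .
\]
```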



Now, we show that JL-HH estimates the counts to within an additive error of
$O(\sqrt{n \log N})$.

Theorem 3.2. For any $\beta > 0$, the JL-HH mechanism is $(\alpha, \beta)$-accurate for the heavy
hitters problem, with $\alpha = O\left(\sqrt{n\log(1/\delta)\log(N/\beta)}/\epsilon\right)$.



Proof. Let $v$ be the private histogram, and let $z = \sum_{i=1}^n z^i$ denote the sum of the noise
vectors added to each individual's data. The error of the mechanism is at most

$$2 \max_{i \in [N]} \left| \langle e_i, v \rangle - \langle Ae_i, Av + z \rangle \right|$$

Note that for all $j$, the random variable $z_j$ is distributed as the sum of $n$ i.i.d. Laplace
random variables each with scale $b = \sqrt{8\log(1/\delta)}/\epsilon$. To calculate the error for
an index $i$, we may write:

$$|\langle e_i, v \rangle - \langle Ae_i, Av + z \rangle| \leq |\langle e_i, v \rangle - \langle Ae_i, Av \rangle| + |\langle Ae_i, z \rangle| \qquad (1)$$
$$= O\left(\gamma\|v\|_2^2 + |\langle Ae_i, z \rangle|\right) \qquad (2)$$

with the second equality following from Theorem 2.9. Recall our choice of $\gamma$, and let $A$ be
the random projection matrix, with $m = O(\log N \log(2/\beta)/\gamma^2)$. With probability at least
$1 - \beta/2$, the random projections matrix $A$ actually satisfies the property for the
Johnson-Lindenstrauss lemma. So, writing $z^k$ for the noise vector of individual $k$, we have

$$\langle Ae_i, z \rangle = \sum_{j=1}^m (Ae_i)_j \sum_{k=1}^n z^k_j$$

But $Ae_i$ is a vector of length $m$ with entries drawn uniformly from $\pm 1/\sqrt{m}$. Since the
Laplace distribution is also symmetric, the distribution of this sum is identical to the
distribution of a sum of $mn$ i.i.d. Laplace random variables each with scale
$b = \sqrt{8\log(1/\delta)}/(\epsilon\sqrt{m})$. By our tail bound in Theorem 2.10, with probability
at least $1 - \beta/2N$, this sum is bounded by
$O\left(\sqrt{n\log(1/\delta)\log(N/\beta)}/\epsilon\right)$.

On the other hand, the other error $|\langle e_i, v \rangle - \langle Ae_i, Av \rangle|$ can be
bounded by Equation (2), and hence is $O(1)$ by our choice of $\gamma$. Thus, with probability at
least $1 - \beta/2N$, we have that the estimated count for index $i$ is within an additive factor of
$O\left(\sqrt{n\log(1/\delta)\log(N/\beta)}/\epsilon\right)$ of the true count of index $i$. Taking
a union bound over all indices, we have that with probability at least $1 - \beta/2$, this accuracy
holds for the heavy hitter, and all other elements. Since the probability of failing when picking
$A$ was at most $\beta/2$, this gives the desired high probability bound. □
3.2 A Lower Bound via Anti-Concentration



Here we show that our upper bound in the previous subsection is essentially optimal: for any
$\epsilon < 1/2$ and any $\delta$ bounded away from 1 by a constant, no $(\epsilon, \delta)$-private
mechanism in the fully distributed setting can be $\alpha$-accurate for the heavy hitters problem
for some $\alpha = \Omega(\sqrt{n})$, even in the case in which $|\mathcal{N}| = 2$. Our theorem
follows by arguing that even after conditioning on the output of the differentially private
interaction with each individual in the local model, there is still quite a bit of uncertainty in
the distribution over heavy hitters, if the universe elements were initially distributed uniformly
at random. We take advantage of this uncertainty to apply an anti-concentration argument, which
implies that no matter what answer the algorithm predicts, there is enough randomness left over in
the database instance that the algorithm is likely to be incorrect (with at least some constant
probability $\beta$). We remark that our technique (while specific to the local privacy model) holds
for $(\epsilon, \delta)$-differential privacy, even when $\delta > 0$. This is in contrast to
techniques for proving lower bounds in the centralized model, such as the elegant packing argument
of [HT10], which are specific to $(\epsilon, 0)$-differential privacy. We note that [MMP+10] used
an independence argument, which is similar in spirit, to prove a lower bound on computing the
Hamming distance between two strings in the two-party setting.

Theorem 3.3. For any $\epsilon < 1/2$ and $\delta < 1$ bounded away from 1, there exists an
$\alpha = \Omega(\sqrt{n})$ and a $\beta = \Omega(1)$ such that no $(\epsilon, \delta)$-private
mechanism in the local model is $(\alpha, \beta)$-accurate for the heavy hitters problem.

Proof. We give a lower bound instance in which the universe is $\mathcal{N} = \{0, 1\}$. Each
individual $i$ is assigned a universe element $s_i \in \{0, 1\}$ uniformly at random. Let
$A_i : \mathcal{N} \to \mathcal{A}_i$ denote the $(\epsilon, \delta)$-differentially private
algorithm which acts on the data $s_i$ of individual $i$, and write $m_i = A_i(s_i)$.

We condition on the order of the parties that we query and on the output of each algorithm,
$m_i = \hat{m}_i$ for fixed $\hat{m}_i \in \mathcal{A}_i$.

We first observe that conditioning on the outputs of each $A_i$: $m_i = \hat{m}_i$ for each $i$, the
random variables $s_i$ remain independent of one another. (This is a standard fact from
communication complexity.)

We next argue that under this conditioning, the marginal distributions of a constant fraction
of the $s_i$ variables remain approximately uniform. If we define the random variables $X_i$ to be
$s_i$ conditioned on all the messages (so that $X_i$ is the indicator of the event $s_i = 1$), we
can apply Bayes' rule to get, for all $i \in [n]$ and any value $\hat{s}_i \in \{0, 1\}$:

$$\Pr[X_i = \hat{s}_i] = \Pr[s_i = \hat{s}_i \mid m_i = \hat{m}_i] = \frac{\Pr[m_i = \hat{m}_i \mid s_i = \hat{s}_i] \Pr[s_i = \hat{s}_i]}{\Pr[m_i = \hat{m}_i]} \leq \frac{\Pr[m_i = \hat{m}_i \mid s_i = \hat{s}_i] \Pr[s_i = \hat{s}_i]}{\Pr[m_i = \hat{m}_i \mid s_i = b]}$$

where $b$ is some element of the universe. Because each $A_i$ is $(\epsilon, \delta)$-differentially
private, we have that with probability at least $1 - \delta$, the following random variable (where
the randomness is over the choice of $\hat{m}_i$) is bounded:

$$\frac{\Pr[m_i = \hat{m}_i \mid s_i = \hat{s}_i]}{\Pr[m_i = \hat{m}_i \mid s_i = b]} \leq e^{\epsilon}$$

and thus with probability $1 - \delta$ over the choice of $\hat{m}_i$:
$\Pr[X_i = \hat{s}_i] \leq e^{\epsilon}/2$, using the prior on $s_i$.
In similar fashion, we can prove a lower bound on the probability. So, we have that for each $i$
independently with probability at least $1 - \delta$:
$\Pr[X_i = \hat{s}_i] \in [e^{-\epsilon}/2, e^{\epsilon}/2]$. Because we assume $\epsilon < 1/2$, we
therefore have for each $i$ independently with probability $1 - \delta$:
$\Pr[X_i = \hat{s}_i] \in [c_1, c_2]$, where $c_1, c_2$ are constants bounded away from 0 and 1
respectively. Because this occurs with constant $1 - \delta$ probability for each $i$, for any
constant $\beta$, we can (by the Chernoff bound) take $n$ to be sufficiently large so that except
with probability $\beta/2$, we have $\Pr[X_i = \hat{s}_i] \in [c_1, c_2]$ for $\Omega(n)$
individuals $i$. This, together with the conditional independence of the $X_i$'s, allows us to apply
the Berry-Esseen theorem:

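Theorem 3.4 (Berry-Esseen). Given independent random variables $X_i$, $i \in [n]$, let
$\mu_i = E[X_i]$, $\sigma_i^2 = E[(X_i - \mu_i)^2]$, $\rho_i = E[|X_i - \mu_i|^3]$, and let

$$S_n = \frac{\sum_{i=1}^n (X_i - \mu_i)}{\sqrt{\sum_{i=1}^n \sigma_i^2}}$$

If $F_n$ is the cdf of $S_n$, and $\Phi$ is the cdf for the standard normal distribution, then there
exists a constant $C$ such that

$$\sup_x |F_n(x) - \Phi(x)| \leq C\psi$$

where

$$\psi = \left(\sum_{i=1}^n \sigma_i^2\right)^{-1/2} \max_{1 \leq i \leq n} \frac{\rho_i}{\sigma_i^2}$$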

For each of the $\Omega(n)$ individuals $i$ for which $\Pr[X_i = 1] \in [c_1, c_2]$, each
$\sigma_i^2$ and $\rho_i$ is a constant bounded away from 0. Thus, we have with probability at least
$1 - \beta/2$: $\psi \leq O(1/\sqrt{n})$, and hence the cdf $F_n$ of $S_n$ converges uniformly to
the normal distribution. By a change of variables, this means that the cdf of the sum
$\sum_{i=1}^n (X_i - \mu_i)$ converges to the cdf of a normal distribution with mean 0 and variance
$\sigma^2 = \sum_{i=1}^n \sigma_i^2 = \Omega(n)$. The next lemma lower bounds the probability that
this sum is at least an additive $\Omega(\sqrt{n})$ away from its mean.

Lemma 3.5. Let $\beta > 0$ be given and condition on the event that
$\Pr[X_i = 1] \in [c_1, c_2]$ for $\Omega(n)$ individuals $i \in [n]$. For sufficiently large $n$,
there exists a constant $C$ such that

$$\Pr\left[\left|\sum_{i=1}^n (X_i - \mu_i)\right| \geq C\sqrt{n}\right] \geq 1 - \beta/2$$

Proof of Lemma. This is immediate, since by the Berry-Esseen theorem the sum
$\sum_{i=1}^n (X_i - \mu_i)$ converges uniformly to a Gaussian distribution with standard deviation
$\sigma = \Omega(\sqrt{n})$. □

To complete the proof, we note that the distribution of $n_1 = \sum_{i=1}^n X_i$ is simply the
distribution of the number of occurrences of universe element 1, after conditioning on the outcome
of differentially private mechanisms $A_1, \ldots, A_n$. Consider a mechanism which, given the
outcome of mechanisms $A_1, \ldots, A_n$, attempts to guess the value of $n_1$, and outputs
$\hat{n}_1$. Let $\mu = \sum_{i=1}^n \mu_i$. By the properties of the Gaussian distribution we have:

$$\Pr[|n_1 - \hat{n}_1| \leq t] \leq \Pr[|n_1 - \mu| \leq t]$$
for all values of $t$. In particular, for some $t = C\sqrt{n}$ we have shown that this probability
is at most $\beta$. In other words, we have shown that for some constant $\beta > 0$ and for some
$\alpha = \Omega(\sqrt{n})$, there is no $(\epsilon, \delta)$-private algorithm in the local model
which is able to estimate the frequency of the heavy hitter to within an additive $\alpha$ factor
with probability $1 - \beta$. It is straightforward to see that there therefore cannot be an
$(\alpha, \beta)$-accurate, $(\epsilon, \delta)$-private mechanism for the heavy hitters problem:
any such mechanism could be converted to a mechanism which estimates the frequency of the heavy
hitter by introducing "dummy" individuals corresponding to the universe element which is not the
heavy hitter, and performing a binary search over their count by computing the identity of the
heavy hitter in each dummy instance. The count at which the identity of the heavy hitter in the
dummy instance changes can then be used to estimate the frequency of the true heavy hitter.

□ 
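The reduction at the end of the proof can be sketched as a binary search over the number of dummy individuals. The sketch below is a simplification under stated assumptions: it treats the heavy hitter mechanism as a black-box callback `hh_with_dummies(element, d)` (a hypothetical interface) that returns the heavy hitter of the two-element instance after `d` dummies holding `element` are added, it assumes the callback's answer changes monotonically in `d` (exactly true for an exact oracle, only approximately so for an $\alpha$-accurate one), and it ignores the privacy cost of the repeated calls.

```python
def estimate_top_count(n, hh_with_dummies):
    """Estimate the heavy hitter's frequency among n individuals over the universe {0, 1}."""
    top = hh_with_dummies(0, 0)  # identity of the heavy hitter with no dummies added
    other = 1 - top
    # Invariant: adding `lo` dummies does not change the answer, adding `hi` does.
    lo, hi = 0, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hh_with_dummies(other, mid) != top:
            hi = mid
        else:
            lo = mid
    # The answer flips roughly when n_other + d reaches n_top, i.e. d ~ 2 * n_top - n.
    return (n + hi) / 2.0
```

With an exact oracle this recovers the count up to the tie-breaking rule, using $O(\log n)$ calls; with an $\alpha$-accurate oracle the flip point, and hence the estimate, is only determined up to an additive $O(\alpha)$.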

4 Efficient Algorithms 

In the last section, we saw the Johnson-Lindenstrauss algorithm which gave almost optimal accuracy 
guarantees, but had running time linear in $|N|$. In this section, we consider efficient algorithms
with running time $\mathrm{poly}(n, \log |N|)$. The first is an application of a sublinear time algorithm from
the compressed sensing literature, and the second is a group-testing approach made efficient by the 
use of a particular family of pairwise-independent hash functions. 

4.1 GLPS Sparse Recovery 

In this section we adapt a sophisticated algorithm from compressed sensing. Gilbert, et al.
[GSTV07] present a sparse recovery algorithm (we refer to it as the GLPS algorithm) that takes
linear measurements from a sparse vector, and reconstructs the original vector to high accuracy.
Importantly, the algorithm runs in time polylogarithmic in $|N|$, and polynomial in the sparsity
parameter of the vector. We remark that our database v is n-sparse: it has at most n non-zero 
components. In the rest of this section, we will write Vs to denote the vector v truncated to contain 
only its s largest components. 

Let $s$ be a sparsity parameter, and let $\gamma$ be a tunable approximation level. The GLPS algorithm
runs in time $O((s/\gamma) \log^{c} N)$, and makes $m = O(s \log(N/s)/\gamma)$ measurements from a specially
constructed (randomized) $\{-1, 0, 1\}$ valued matrix, which we will denote $\Phi$. Given measurements
$u = \Phi v + z$ (where $z$ is arbitrary noise), the algorithm guarantees an error bound (with probability
at least 3/4):

$$\|\hat{v} - v\|_2 \le (1 + \gamma)\|v - v_s\|_2 + \gamma \log(s) \frac{\|z\|_2}{K} \qquad (3)$$

with $K = O(\log^2(s) \log(N/s))$.

Though the GLPS bound only occurs with probability 3/4, the success probability can be made
arbitrarily close to 1 by running this algorithm several times. In particular, using the amplification
lemma from [GLM+10], the failure probability can be driven down to $\beta$ at a cost of only a factor
of $\log(1/\beta)$ in the accuracy. In what follows, we analyze a single run of the algorithm.

Next, we will show that GLPS-HH is $(\epsilon, \delta)$-differentially private.

Theorem 4.1. GLPS-HH operates in the local privacy model and is $(\epsilon, \delta)$-differentially private.

Algorithm 2 GLPS-HH Mechanism

Input: Private histograms $v^i \in \mathbb{N}^N$, $i \in [n]$. GLPS matrix $\Phi$. Privacy parameters $\epsilon, \delta > 0$.
Output: $p^*$, estimated index of heavy hitter.
  $b \leftarrow \sqrt{8m\log(1/\delta)}/\epsilon$
  $c \leftarrow 0$
  for $i = 1$ to $n$ users do
    $z^i \sim \{\mathrm{Lap}(b)\}^m$
    $c \leftarrow c + \Phi v^i + z^i$
  end for
  $\hat{v} \leftarrow \mathrm{GLPS}(c, \Phi)$
  $p^* \leftarrow \arg\max_p \hat{v}_p$
  return $p^*$



Proof. The algorithm operates in the local privacy model because each individual $i$ computes $\Phi v^i + z^i$
independently, which corresponds to answering $m$ linear queries, each with sensitivity 1. The
magnitude of the Laplace noise added, $z^i$, is then sufficient (by Theorem 2.5) to guarantee $(\epsilon, \delta)$-
differential privacy for each individual. □
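
The local step of GLPS-HH can be sketched as follows. This is a minimal illustration, not the authors' implementation: `phi` stands in for the GLPS measurement matrix (its construction and the GLPS decoder are not reproduced here), and the helper names `laplace_scale`, `sample_laplace`, `local_report` and `aggregate` are hypothetical. The only quantities taken from the text are the noise scale $b = \sqrt{8m\log(1/\delta)}/\epsilon$ and the fact that each user reports $\Phi v^i + z^i$.

```python
import math
import random

def laplace_scale(m, eps, delta):
    # Noise scale b = sqrt(8 m log(1/delta)) / eps, as in Algorithm 2.
    return math.sqrt(8 * m * math.log(1 / delta)) / eps

def sample_laplace(b, rng):
    # The difference of two independent Exp(rate 1/b) draws is Laplace(b).
    return rng.expovariate(1 / b) - rng.expovariate(1 / b)

def local_report(phi, item, eps, delta, rng):
    """One user's message: column `item` of phi plus fresh Laplace noise.

    phi is an m x N matrix given as a list of rows with entries in {-1, 0, 1}.
    Because the user's histogram v^i is the indicator vector of `item`,
    the product phi v^i is simply column `item` of phi.
    """
    m = len(phi)
    b = laplace_scale(m, eps, delta)
    return [row[item] + sample_laplace(b, rng) for row in phi]

def aggregate(reports):
    # The server sees only the noisy reports and sums them: c = phi v + z.
    return [sum(col) for col in zip(*reports)]
```

The server never touches an un-noised histogram: all randomness is added on the user's side, which is what places the mechanism in the local model.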

Next, we will bound the error that we introduce by adding noise for differential privacy. 

Theorem 4.2. Let $\beta > 0$ be given. GLPS-HH is $(\alpha, 3/4 - \beta)$-accurate for the heavy hitters problem,
with

$$\alpha = O\left(\frac{n^{5/6} \log^{1/3}(1/\beta) \log N \log\log N \log^{1/6}(1/\delta)}{\epsilon^{1/3}}\right).$$



Proof. Let $b = \sqrt{8m\log(1/\delta)}/\epsilon$. Let $v$ denote the combined private database, and let $\hat{v}$ denote the
estimated private database returned by GLPS. GLPS-HH uses the GLPS algorithm with measurements
$c = \Phi v + z$, $z = \sum_i z^i$, where the noise vector $z$ has each entry drawn from $\sum_{i=1}^{n} \mathrm{Lap}(b)$.
From Theorem 2.10, we have the bound (for a fixed index $i$)

$$\Pr[|z_i| \le O(b\sqrt{n}\log(m/\beta))] \ge 1 - \beta/m.$$

Taking a union bound over all $m$ indices, we find this bound holds over all components with
probability at least $1 - \beta$. Thus we can bound

$$\|z\|_2 \le O(b\sqrt{nm}\log(m/\beta)) = O\left(\frac{s\log(N/s)\log(s\log(N/s)/\beta)\sqrt{n\log(1/\delta)}}{\epsilon}\right).$$



With probability 3/4, we have the GLPS bound Equation (3), from which we can estimate

$$\|\hat{v} - v\|_\infty \le \|\hat{v} - v\|_2 \le O\left(\|v - v_s\|_2 + \frac{s\log(N/s)\log(s\log(N/s)/\beta)\sqrt{n\log(1/\delta)}}{\epsilon}\right).$$



By a Lemma from [GSTV07], we have $\|v - v_s\|_2 \le \|v\|_1/\sqrt{s}$. Now, in the worst case, $\|v\|_1 = O(n)$,
and we need to choose $s$ to balance the errors in

$$\|\hat{v} - v\|_\infty \le O\left(\frac{n}{\sqrt{s}} + \frac{s\log(N/s)\log(s\log(N/s)/\beta)\sqrt{n\log(1/\delta)}}{\epsilon}\right).$$



By setting $s$ to be:

$$s = \left(\frac{\epsilon\sqrt{n}}{\log(1/\beta)\sqrt{\log(1/\delta)}}\right)^{2/3}$$

we get an error bound

$$\|\hat{v} - v\|_\infty \le O\left(\frac{n^{5/6} \log^{1/3}(1/\beta) \log N \log\log N \log^{1/6}(1/\delta)}{\epsilon^{1/3}}\right).$$

Thus, with probability at least $3/4 - \beta$, we get the desired accuracy. □

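The balancing of the two error terms in the proof of Theorem 4.2 can be checked numerically. The sketch below is a simplification, not part of the paper: it drops the $\log N$ and $\log\log N$ factors and all hidden constants, and the function names are hypothetical. It only confirms that $s = (\epsilon\sqrt{n}/(\log(1/\beta)\sqrt{\log(1/\delta)}))^{2/3}$ equalizes the tail term $n/\sqrt{s}$ and the noise term, and that both then scale as $n^{5/6}$.

```python
import math

def balanced_sparsity(n, eps, beta, delta):
    # s = (eps * sqrt(n) / (log(1/beta) * sqrt(log(1/delta))))^(2/3)
    x = math.log(1 / beta) * math.sqrt(math.log(1 / delta))
    return (eps * math.sqrt(n) / x) ** (2 / 3)

def error_terms(n, s, eps, beta, delta):
    # Tail term n / sqrt(s) and noise term s * sqrt(n) * x / eps,
    # with the log N and log log N factors deliberately dropped.
    x = math.log(1 / beta) * math.sqrt(math.log(1 / delta))
    return n / math.sqrt(s), s * math.sqrt(n) * x / eps
```

For example, calling `error_terms(n, balanced_sparsity(n, eps, beta, delta), eps, beta, delta)` returns two equal values; a smaller $s$ inflates the tail term and a larger $s$ inflates the noise term.
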
4.2 The Bucket mechanism 

In this section we present a second computationally efficient algorithm, based on group-testing and 
a specific family of pairwise independent hash functions. 

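Algorithm 3 The Bucket Mechanism

Input: Private labels $v_i \in [N]$, $i \in [n]$. Failure probability $\beta > 0$. Privacy parameters $\epsilon, \delta > 0$.
Output: $p^*$, the index of the heavy hitter.
  $F \leftarrow \{0,1\}^{\log N} \setminus 0$
  for $i = 1$ to $8\log(1/\beta)$ trials do
    $H \in \{0,1\}^{\log 12N \times \log N} \leftarrow$ draw $\log 12N$ rows from $F$, uniformly at random.
    $u \in \mathbb{R}^{\log 12N} \leftarrow 0$
    for $j = 1$ to $n$ users do
      $b \in \{0,1\}^{\log N} \leftarrow$ binary expansion of $v_j$.
      $s \leftarrow Hb \pmod 2$
      $z \sim \{\mathrm{Lap}(8\sqrt{\log(12N)\log(1/\beta)\log(1/\delta)}/\epsilon)\}^{\log 12N}$
      $u \leftarrow u + s + z$
    end for
    for $k = 1$ to $\log 12N$ hash functions do
      $b_k \leftarrow 1$ if $u_k > n/2$, and $0$ otherwise
    end for
    $w_i \leftarrow x_0$ if $Hx_0 = b \pmod 2$, and $\perp$ if $Hx = b \pmod 2$ is infeasible
  end for
  $w^* \leftarrow$ most frequent $w_i$, ignoring $\perp$
  return $p^* \leftarrow w^*$ converted from binary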



At a high level, our algorithm, referred to as the Bucket mechanism, runs $O(\log(1/\beta))$ trials
consisting of $O(\log N)$ 0/1 valued hash functions in each trial. For a given trial, the mechanism
hashes each universe element into one of two buckets for each hash function. Then, the mechanism
tries to find an element that hashes into the majority bucket for all the hash functions. If there is
such an element, it is a candidate for the heavy hitter for that trial. Finally, the mechanism takes
a majority vote over the candidates from each trial to output a final heavy hitter.

For efficiency purposes we do not use truly random hash functions, but instead rely on a
particular family of pairwise-independent hash functions which can be expressed as linear functions
on the bits of a universe element. Specifically, each function $h$ in the family maps $[N]$ to $\{0, 1\}$, and
is parameterized by a bit-string $r \in \{0, 1\}^{\log |N|}$. In particular, given any bit-string $r \in \{0, 1\}^{\log |N|}$,
we define $h_r(x) = \langle r, b(x) \rangle$ (taken mod 2), where $b(x)$ denotes the binary representation of $x$. $r$ is chosen uniformly
at random from the set of all strings $r \in \{0, 1\}^{\log |N|} \setminus 0^{\log |N|}$. Given hash functions of this form,
and a list of target buckets, the problem of finding an element that hashes to all of the target
buckets is equivalent to solving a linear system mod 2, which can be done efficiently. Our family of
hash functions operates on the element label in binary, hence the conversions to and from binary
in the algorithm.
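
To make the linear-algebra view concrete, the sketch below evaluates $h_r(x) = \langle r, b(x) \rangle \bmod 2$ and recovers an element that lands in a given list of target buckets by Gaussian elimination over GF(2). It is an illustration only: the function names are hypothetical, bits are stored least significant first (an arbitrary choice), and free variables are set to 0, so when the system is underdetermined it returns one solution rather than all of them.

```python
def to_bits(x, width):
    # Binary expansion b(x), least significant bit first.
    return [(x >> j) & 1 for j in range(width)]

def hash_bit(r, x_bits):
    # h_r(x) = <r, b(x)> mod 2.
    return sum(a & b for a, b in zip(r, x_bits)) % 2

def solve_mod2(H, target):
    """Return some x with H x = target (mod 2), or None if infeasible.

    Plain Gauss-Jordan elimination over GF(2); free variables are set to 0.
    """
    rows = [list(row) + [t] for row, t in zip(H, target)]
    width = len(H[0])
    pivots = []  # pivots[i] is the pivot column of reduced row i
    rank = 0
    for col in range(width):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        pivots.append(col)
        rank += 1
    # A zero row with right-hand side 1 means no element matches all buckets.
    if any(row[-1] for row in rows[rank:]):
        return None
    x = [0] * width
    for i, col in enumerate(pivots):
        x[col] = rows[i][-1]
    return x
```

Each row of `H` plays the role of one hash function's parameter string $r$, and `target` is the list of majority buckets for the trial; a return value of `None` corresponds to the $\perp$ outcome in Algorithm 3.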

We will now show that the bucket mechanism is $(\epsilon, \delta)$-differentially private, runs in time
$\mathrm{poly}(n, \log |N|)$, and assuming a certain condition on the distribution over universe elements,
returns the exact heavy hitter. The accuracy analysis proceeds in two steps: first, we argue that with
constant probability > 1/2, the heavy hitter is the unique element hashed into the larger bucket
by every hash function in a given trial. Then, we argue that with high probability, the preceding
event indeed occurs in the majority of trials, and so the majority vote among all trials returns the
true heavy hitter.

Theorem 4.3. The Bucket mechanism operates in the local model and is $(\epsilon, \delta)$-differentially private.

Proof. Each party answers $\log 12N$ 1-sensitive queries about only their own data for each trial,
with a total of $8\log(1/\beta)$ trials. By Theorem 2.5, the correct amount of noise is added to preserve
$(\epsilon, \delta)$-differential privacy. □

Theorem 4.4. For fixed $\epsilon, \delta > 0$ and failure probability $\beta > 0$, the Bucket mechanism runs in time
$O(n \log(1/\beta) \log^3 N)$.

Proof. The step that dominates the run time is the inner loop over each party. For each user, the
algorithm evaluates $O(\log N)$ hash functions. Each evaluation calculates the inner product of two
$\log N$-length bit strings, and there are $O(\log N)$ hash functions. So, each user takes time $O(\log^2 N)$
per trial. With $n$ users and $O(\log(1/\beta))$ trials, the result follows. □

We first prove a simple tail bound on sums of $k$-wise independent random variables, modifying
a result given by Bellare and Rompel, [BR94].

Lemma 4.5. Let $k$ be even. Take a $k$-independent set of random variables $X_i$, with $0 \le X_i \le c_i$,
let $X = \sum X_i$, and let $\mu = E[X]$. We have:

$$\Pr[|X - \mu| > t] \le C_k \left(\frac{ck}{t^2}\right)^{k/2}$$

with $c = \sum c_i^2$, and $C_k = 2\sqrt{\pi k}\, e^{1/(6k) - k/2} < 1.0004$.

Proof. By Markov's inequality, we can write:

$$\Pr[|X - \mu| > t] = \Pr[(X - \mu)^k > t^k] \le \frac{E[(X - \mu)^k]}{t^k}.$$



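However, if we expand out the product, we find that we only need to consider the expected
value of products of at most $k$ of the variables $X_i$. Thus, without loss of generality, we may consider
$X_i$ to be independent for the following calculation.

$$E[(X - \mu)^k] = \int_0^\infty \Pr[(X - \mu)^k > s]\,ds = \int_0^\infty \Pr[|X - \mu| > s^{1/k}]\,ds \le \int_0^\infty 2e^{-s^{2/k}/(2c)}\,ds$$

where we have used that the $X_i$ are independent in order to apply Azuma's inequality. By a
change of variables, and letting $c = \sum c_i^2$, we have

$$E[(X - \mu)^k] \le (2c)^{k/2}\, k \int_0^\infty e^{-y} y^{k/2 - 1}\,dy = (2c)^{k/2}\, k\, \Gamma(k/2) = 2(2c)^{k/2} \left(\frac{k}{2}\right)! \le 2\sqrt{\pi k}\, e^{1/(6k)} \left(\frac{ck}{e}\right)^{k/2}$$

where we have used Stirling's approximation in the last step. Now, we get

$$\Pr[|X - \mu| > t] \le C_k \left(\frac{ck}{t^2}\right)^{k/2}$$

as desired. □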
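
The Stirling step in this proof relies on the standard upper bound $m! \le \sqrt{2\pi m}\,(m/e)^m e^{1/(12m)}$, applied with $m = k/2$. As a sanity check, the short sketch below (the function name is hypothetical, and the range of $k$ tested is arbitrary) compares $(k/2)!$ against that bound for small even $k$.

```python
import math

def stirling_upper(k):
    # Upper bound sqrt(pi k) * (k / (2e))^(k/2) * e^(1/(6k)) on (k/2)!,
    # obtained from m! <= sqrt(2 pi m) (m/e)^m e^(1/(12m)) with m = k/2.
    return (math.sqrt(math.pi * k)
            * (k / (2 * math.e)) ** (k / 2)
            * math.exp(1 / (6 * k)))

def stirling_holds(k):
    # True when (k/2)! is at most the bound, for even k.
    return math.factorial(k // 2) <= stirling_upper(k)
```

The bound is already tight for small $k$, so little is lost by using it in place of the exact factorial.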

Lemma 4.6. Let $\beta, \epsilon, \delta > 0$ be given, and consider a single trial in the Bucket mechanism. Without
loss of generality, suppose that the elements are labeled in decreasing order of count, with counts
$v_1 \ge v_2 \ge \cdots \ge v_N$. Write $c = \sum_{i=2}^{N} v_i^2$, let $k_1$ be the number of hash functions per trial, and $k_2$ be
the number of trials. If we have the condition

$$v_1 > 2\sqrt{\frac{8ck_1}{\beta}} + b(k_1, k_2)\sqrt{6n}\,\log\left(\frac{6k_1}{\beta}\right)$$

where $b$ is the parameter for $(\epsilon, \delta)$-differential privacy:

$$b(k_1, k_2) = \frac{\sqrt{8 k_1 k_2 \log(1/\delta)}}{\epsilon}$$

then with probability at least $1 - 2\beta/3$, the heavy hitter is hashed into the larger bucket for each
hash function in the trial.

Proof. First consider a single hash function. If we define random variables $X_i$, $i \in [N]$ by:

$$X_i = \begin{cases} v_i & : i \text{ is hashed to bucket 1} \\ 0 & : \text{otherwise} \end{cases}$$

and the function $f(X_2, \ldots, X_N) = \sum_{i=2}^{N} X_i$, we can show that the true heavy hitter will be hashed to
the larger bucket (with high probability) if $f$ does not deviate from the mean by too much. If $f$
is close to the mean, then no matter which bucket the heavy hitter is hashed to, that will become
the larger bucket. However, we will need to keep track of the noise that will be added to preserve
differential privacy. We want $v_1$ to be large enough to overcome the noise (with high probability).



More precisely, by Theorem 2.10, the sum of $n$ Laplace noise terms will be bounded by
$b\sqrt{6n}\log(6k_1/\beta)$, with probability at least $1 - \beta/3k_1$. We also know that the collection $\{X_2, \ldots, X_N\}$
is a pairwise-independent set of random variables, so applying Lemma 4.5 with $X = f$, and
$t = \sqrt{8ck_1/\beta}$, we have that

$$\Pr[|f - \mu| > t] \le C_2\left(\frac{2c}{t^2}\right) \le \frac{\beta}{3k_1}$$

with $C_2$ a constant from Lemma 4.5. The difference between the counts in the two buckets will be
$2|f - \mu|$, so for the heavy hitter to be hashed to the larger bucket, we need $v_1 > 2|f - \mu| + |z|$, where
$z$ is the Laplace noise term, with high probability. Taking a union bound over $k_1$ hash functions,
we have that

$$2|f - \mu| + |z| \le 2\sqrt{\frac{8ck_1}{\beta}} + b\sqrt{6n}\,\log\left(\frac{6k_1}{\beta}\right)$$

holds for all the hash functions in this trial with probability at least $1 - 2\beta/3$. But by assumption,
$v_1$ is larger than this gap, and so we are done. □

Lemma 4.7. Let the notation be as in the previous Lemma, and consider a single trial in the Bucket
mechanism. If we set $k_1 = \log(3N/\beta)$, then with probability at least $1 - \beta/3$, no other element will
be hashed to the same bucket as the heavy hitter through all the hash functions.

Proof. Pick any element g besides the heavy hitter, and consider a single hash function. Since the 
hash function is pairwise-independent, conditioning on where the heavy hitter is hashed will not 
change the marginal for where g will be hashed. Thus, there is a 1/2 chance of g colliding with 
the heavy hitter for any given hash function. Since the hash functions are drawn independently at 
random, the chance of this collision happening on every function is $(1/2)^{k_1} = \beta/(3N)$, by choice
of $k_1$. Taking a union bound over the $N - 1$ elements besides the heavy hitter, we have that this
collision probability for all elements is bounded by $\beta/3$, as desired. □
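
The choice of $k_1$ can be checked with a few lines of arithmetic. The sketch below assumes the logarithm is base 2 (each hash function halves the collision probability) and rounds $k_1$ up to an integer; both are assumptions made for the illustration, and the function names are hypothetical.

```python
import math

def hashes_per_trial(N, beta):
    # k1 = log2(3N / beta), rounded up so that (1/2)^k1 <= beta / (3N).
    return math.ceil(math.log2(3 * N / beta))

def collision_bound(N, k1):
    # Union bound over the N - 1 elements other than the heavy hitter,
    # each colliding on all k1 hash functions with probability (1/2)^k1.
    return (N - 1) * 0.5 ** k1
```

With a per-trial failure probability of 1/4, this gives $k_1 = \lceil \log_2 12N \rceil$, matching the $\log 12N$ hash functions used in Algorithm 3.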

Now, we are ready to put everything together. 

Theorem 4.8. Let the notation be as in the previous Lemma. If we set $k_1 = \log 12N$, $k_2 =
8\log(1/\beta)$, and if we have the condition

$$v_1 > 8\sqrt{2c\log 12N} + \frac{8\log(24\log 12N)\sqrt{6n\log 12N\log(1/\beta)\log(1/\delta)}}{\epsilon}$$

then the Bucket mechanism is $(0, \beta)$-accurate for the heavy hitters problem.



Proof. First, note that $k_1$ and the condition have been chosen so that from Lemmas 4.6 and 4.7,
for any single trial, the heavy hitter is always hashed to the larger bucket, and is the unique such
element, with probability at least 3/4. These two conditions ensure that we are able to correctly
identify the heavy hitter with probability 3/4 for a single trial. Now, as the trials are independent,
we apply a Chernoff bound to show that out of $k_2$ Bernoulli variables with success probability 3/4,
the probability that at least half of them succeed is bounded below by

$$\Pr[\text{Majority Vote Success}] \ge 1 - e^{-2k_2(1/4)^2} = 1 - \beta$$

by our choice of $k_2$. Thus, the Bucket mechanism returns the true heavy hitter with probability at
least $1 - \beta$. □
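
The majority-vote step can be checked numerically. The sketch below is an illustration with hypothetical function names; it assumes natural logarithms and rounds $k_2 = 8\log(1/\beta)$ up to an integer. It compares the bound $e^{-2k_2(1/4)^2}$ with the exact probability that at most half of $k_2$ independent trials, each succeeding with probability 3/4, succeed.

```python
import math

def num_trials(beta):
    # k2 = 8 log(1/beta) (natural log), rounded up to an integer.
    return math.ceil(8 * math.log(1 / beta))

def chernoff_failure_bound(k2):
    # Bound exp(-2 * k2 * (1/4)^2) on the chance that at most half succeed.
    return math.exp(-2 * k2 * 0.25 ** 2)

def exact_failure(k2, p=0.75):
    # Exact probability that at most half of k2 independent trials succeed.
    return sum(math.comb(k2, j) * p ** j * (1 - p) ** (k2 - j)
               for j in range(k2 // 2 + 1))
```

The exact failure probability is typically far below the bound, so the constant 8 in $k_2$ is conservative.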

We note that the accuracy guarantee of the bucket mechanism is incomparable to those of 
our other mechanisms. While the other mechanisms guarantee (without conditions) to return an 
element which occurs within some additive factor a as frequently as the true heavy hitter, the bucket 
mechanism always returns the true heavy hitter, so long as a certain condition on v is satisfied. 
When the condition is not satisfied, the algorithm comes with no guarantees. The condition is 
roughly that the heavy hitter should occur more frequently than the $\ell_2$-norm of the frequencies
of all other elements. Depending on the distribution over elements, this condition can be satisfied
when the heavy hitter occurs with frequency as small as $O(\sqrt{n})$, or can require frequency as large
as $O(n)$. Finally, we note that this condition is not unreasonable. It will, for example, be satisfied
with high probability if the frequency of the database elements is drawn from a Zipf distribution, 
as frequencies often times are. 
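
The two extremes mentioned above can be illustrated on toy databases. The sketch below uses a simplified stand-in for the condition of Theorem 4.8, namely $v_1 > \|(v_2, \ldots, v_N)\|_2$, ignoring the constants, the $\log N$ factors and the noise term; the function names and the example counts are hypothetical.

```python
import math

def l2_of_rest(counts):
    # l2 norm of every count except the largest (the heavy hitter's).
    rest = sorted(counts, reverse=True)[1:]
    return math.sqrt(sum(c * c for c in rest))

def dominates_rest(counts):
    # Simplified stand-in for the condition: v_1 > ||(v_2, ..., v_N)||_2.
    return max(counts) > l2_of_rest(counts)

n = 10_000
# Favourable case: the other n - 200 parties all hold distinct elements,
# so the rest has l2 norm sqrt(n - 200), which is below sqrt(n) = 100.
spread = [200] + [1] * (n - 200)
# Unfavourable case: the other parties all hold one common element,
# so the rest has l2 norm n - v_1 and v_1 must exceed n / 2.
lumped = [5_100, 4_900]
# Three near-equal elements: v_1 is a third of n yet fails the condition.
near_tie = [3_400, 3_300, 3_300]
```

In the `spread` database a heavy hitter held by 2% of the parties already dominates, while in `near_tie` a heavy hitter held by 34% of the parties does not.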



5 Discussion and Open Questions 

We have initiated the study of the private heavy hitters problem in the fully distributed (local) 
privacy model. We have provided an (almost) tight characterization of the accuracy to which the 
problem can in principle be solved. In particular, we have separated the local privacy model from 
the centralized privacy model: we have shown that even the easier problem of simply releasing the 
approximate count of the heavy hitter cannot be accomplished to accuracy better than $\Omega(\sqrt{n})$ in the
local model, whereas this can be accomplished to $O(1)$ accuracy in the centralized model. We have
also given several efficient algorithms for the heavy hitters problem, but these algorithms do not
in general achieve the tight $O(\sqrt{n \log |N|})$ accuracy bound that we have established is possible in
principle. We leave open the question of whether there exist efficient algorithms in the local model 
which can solve the heavy hitters problem up to this information theoretically optimal bound.